Before the lesson:
Please make sure you have the latest versions of R and RStudio installed.
Lesson objectives:
* learn to perform a search in an academic literature database
* download the results and import them into R
* make simple bibliometric networks and plots
Lesson outline:
* About this lesson
* Getting the data
* Summarising the data
* Bibliographic networks
* More resources
http://www.bibliometrix.org/documents.html
http://revtools.net/
This lesson is prepared for those who are already familiar with the R language, R Markdown and RStudio. By the end of this tutorial you should be able to create a simple HTML document containing markdown-formatted text, images and R code, all in RStudio.
You can do analyses of literature on any topic. In this lesson we will have a look at the academic literature related to the concept of terminal investment. The terminal investment hypothesis predicts increased investment of resources into reproduction as the chances of survival decrease. This can be observed as increased reproductive effort in older animals or in animals challenged with factors signalling a threat to their survival (e.g. predation, pathogens, parasites).
Terminal investment in animals is usually studied in three main ways:
1. via observational studies of correlations of age and reproductive effort,
2. in experimental studies where animals are subjected to immune challenges and their subsequent reproductive effort is compared to that of unchallenged animals of the same age,
3. in experimental studies where the reproductive response to an immune challenge is compared between older and younger animals.
You can read more on this Wiki page: https://en.wikipedia.org/wiki/Terminal_investment_hypothesis
We hope the topic is quite appealing and quite easy to understand. There are several published reviews on the terminal investment hypothesis, so we can expect many publications related to this topic, as well as many researchers working on it. Is this so?
To find out, we will run bibliometric analyses on a relevant sample of the literature. Note that some R packages (and many other online/software tools) are available (and more are being developed) that can perform some of the tasks in this exercise, and often much more. For your own project you may want to try some of them, but there is no single “perfect” tool that fits all possible analyses and is easy to use for all disciplines and types of research questions. The main purpose of this exercise is to familiarise you with the basic principles and issues of bibliometric analyses; you can always learn more in your own time if you are interested.
First we need to find a representative sample of academic publications on our topic of choice. For this, we will use Scopus, a cross-disciplinary database of academic literature. Its coverage of the published literature is among the largest available, so it should give us a reasonably complete picture.
Note that we have free access to this database on campus, but you will not be able to access it from off campus unless you use the UNSW (or another university's) proxy servers.
The alternative database, commonly used for broad academic literature searches and analyses, is Web of Science (https://www.webofknowledge.com/).
Press the “Search” button. You should see something like this:
Hey, this does not look good: very few documents were found, and some of them are completely unrelated (e.g. building shipping terminals).
Why is that?
This is because our search is too simple. It only finds papers that explicitly mention the phrase “terminal investment” in their title, abstract or keywords. We need a more sophisticated approach to find a better set of papers to analyse. We will also narrow our topic a little and aim to find papers that use the immune challenge approach in wild or semi-wild animal species (we should exclude established lab model species such as mice and rats, domesticated animals such as dogs and pigs, and humans). Finding the best search string is a bit of an art, so we just provide you with this one to save time:
(TITLE-ABS-KEY ( ( "terminal investment" OR "reproductive effort" OR "fecundity compensation" OR "reproductive compensation" OR "reproductive fitness" OR "reproductive investment" OR "reproductive success" OR "Life History Trade-Off*" OR "Phenotypic Plasticity" ) AND ( "immune challeng*" OR "immunochalleng*" OR "infect*" OR lipopolysaccharide OR lps OR phytohemagglutinin OR pha OR "sheep red blood cells" OR srbc OR implant OR vaccin* ) ) AND NOT TITLE-ABS-KEY ( load OR human OR people OR men OR women OR infant* OR rat OR rats OR mouse OR mice OR pig* OR pork OR beef OR cattle OR sheep OR lamb* OR chicken* OR calf* OR *horse* ))
You need to copy and paste the above search string into the Advanced Search tab of the Scopus Search page:
Press the “Search” button. You should see something like this:
There are over 1,000 records retrieved from the Scopus database (some look relevant and many do not, but that is always the case). On the left of the results window you can see simple filters: year, most common author names, subject areas, etc. You can explore the whole set roughly by using the “Analyze search results” link above the table of hits:
Next, we will export the records for more detailed bibliometric analyses in R. To do so, close the Scopus analysis window to go back to the list of records found. First select all records by ticking the “All” box at the top left of the list of references. Then click the “Export” link to the right.
A pop-up window with the export options will appear. First, select the format of the export (the code later in this lesson reads a BibTeX file, scopus.bib). Second, select which fields to export by ticking the boxes on top of each column (or as needed). For bibliometric analyses of the citations among papers, it is essential to tick the box next to “Include references” (i.e. data on the cited documents).
Note that, unfortunately, Scopus limits the number of exported records to 2000. For longer lists of records, you will need to split them into smaller chunks for the export and then merge them into a single larger data set (not covered in this tutorial). The WoS export limit is 500 records.
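Although merging chunks is outside the scope of this lesson, the idea can be sketched in base R. This is only an illustration under stated assumptions: `chunk1` and `chunk2` are hypothetical stand-ins for data frames obtained from separately exported chunks, they are assumed to share the same columns, and the DOI column (`DI`) is assumed to be good enough for spotting duplicates.

```r
# Toy stand-ins for two converted export chunks (hypothetical records).
chunk1 <- data.frame(TI = c("PAPER A", "PAPER B"),
                     DI = c("10.1000/a", "10.1000/b"),
                     stringsAsFactors = FALSE)
chunk2 <- data.frame(TI = c("PAPER B", "PAPER C"),
                     DI = c("10.1000/b", NA),
                     stringsAsFactors = FALSE)

# Stack the chunks; this assumes both have identical column names.
merged <- rbind(chunk1, chunk2)

# Drop records whose DOI was already seen, but keep records without a DOI,
# because missing DOIs cannot be compared.
is_dup <- duplicated(merged$DI) & !is.na(merged$DI)
merged <- merged[!is_dup, ]

nrow(merged)
```

Records without a DOI would need another key (for example title plus year), so treat this as a starting point rather than a complete de-duplication routine.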
Click the “Export” button. A file named “Scopus” (with an extension matching your export format, e.g. .bib or .ris) will be saved to your downloads folder.
Note that when you export references with their reference lists included in the records, the resulting files are quite large (in our case around 16 MB).
In case you did not succeed in exporting the files (or you wish to work with exactly the same ones we used, or you cannot access Scopus), the files downloaded on 27/05/2019 are provided in the /data subdirectory.
Create a new R Markdown file to save your code (you can do this within a new RStudio project). Install and load the bibliometrix R package (you may also need readxl and tidyverse):
install.packages("bibliometrix", dependencies = TRUE) # installs the bibliometrix package and its dependencies
library(bibliometrix) # loads the package
#library(readxl)
#library(tidyverse)
#output not displayed for this chunk
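If you re-run your document often, you may prefer not to reinstall packages every time. The sketch below is one possible pattern, not part of the original lesson: it checks which of the packages named above are missing and installs them only in an interactive session. The variable names `pkgs` and `missing_pkgs` are illustrative.

```r
# Packages used in this lesson (readxl and tidyverse are optional).
pkgs <- c("bibliometrix", "readxl", "tidyverse")

# requireNamespace() returns FALSE when a package is not installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only in an interactive session, so scripts never install silently.
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs, dependencies = TRUE)
}

missing_pkgs
```

When knitting an R Markdown file the session is not interactive, so the chunk only reports what is missing.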
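Next, load the files exported from Scopus into R. Then convert them into the internal bibliometrix format.
tmp <- readFiles("data/scopus.bib") # read the raw BibTeX export (in bibliometrix 3.0 and later, convert2df() takes the file path directly and readFiles() is not needed)
bib <- convert2df(tmp, dbsource = "scopus", format = "bibtex") # convert to a bibliometric data frame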
#>
#> Converting your scopus collection into a bibliographic dataframe
#>
#> Articles extracted 100
#> Articles extracted 200
#> Articles extracted 300
#> Articles extracted 400
#> Articles extracted 500
#> Articles extracted 600
#> Articles extracted 700
#> Articles extracted 800
#> Articles extracted 900
#> Articles extracted 1000
#> Articles extracted 1100
#> Articles extracted 1167
#> Done!
#>
#>
#> Generating affiliation field tag AU_UN from C1: Done!
names(bib)
#> [1] "AU" "TI" "SO" "JI" "DT" "DT1"
#> [7] "DE" "ID" "AB" "C1" "RP" "CR"
#> [13] "TC" "PY" "DI" "UT" "DB" "AU_UN"
#> [19] "AU1_UN" "AU_UN_NR" "SR"
#write.csv(bib, "data/bib_as_df.csv", row.names = FALSE) #save as a data frame
After some processing, an object called “bib” is created. It contains a data frame in which each row corresponds to one publication exported from Scopus and each column corresponds to a field exported from the Scopus online database. (Note: if you tried to achieve this by exporting a csv file directly from Scopus, you would get a messy data frame, because missing field values shift the cells between columns.)
What are the contents of the columns of our “bib” data frame? Columns are labelled with short field tags (mostly two letters): AU, TI, SO, JI, DT, DT1, DE, ID, AB, C1, RP, CR, TC, PY, DI, UT, DB, AU_UN, AU1_UN, AU_UN_NR, SR.
For a complete list of field tags used in bibliometrix you can have a look at this file: http://www.bibliometrix.org/documents/Field_Tags_bibliometrix.pdf
Our data frame contains just a subset of these codes. Which ones?
Note that the column bib$AU contains the authors of each paper (as surnames and initials) separated by semicolons (;). We can split these strings and extract a list of all author names into a vector:
head(bib$AU) #have a look at the first few records
#> [1] "SUEUR C;ROMANO V;SOSA S;PUGA-GONZALEZ I"
#> [2] "JOHNS S;HENSHAW JM;JENNIONS MD;HEAD ML"
#> [3] "GERVASI CL;LOWERRE-BARBIERI SK;VOGELBEIN WK;GARTLAND J;LATOUR RJ"
#> [4] "JUNGWIRTH A;BALZARINI V;ZÖTTL M;SALZMANN A;TABORSKY M;FROMMEN JG"
#> [5] "REYES-RAMÍREZ A;ENRÍQUEZ-VARA JN;ROCHA-ORTEGA M;TÉLLEZ-GARCÍA A;CÓRDOBA-AGUILAR A"
#> [6] "NWAOGU CJ;CRESSWELL W;VERSTEEGH MA;TIELEMAN BI"
authors <- bib$AU
authors <- unlist(strsplit(authors, ";")) #split the records into individual authors at ; and vectorize
authors <- authors[order(authors)] #order alphabetically
head(authors) #have a look again
#> [1] "ABBOTT J" "ABE A" "ABEDON ST" "ABO-SHEHADA M"
#> [5] "ABOUL-SOUD MAM" "ABRANTES N"
# View(unique(authors)) #use to see all the values
# write.csv(authors, "data/author_list_uncleaned.csv", row.names = FALSE) #you can save this in a file
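The same splitting step can also give simple counts. The sketch below uses a toy vector `au` with hypothetical author names in place of bib$AU, and shows how to count authors per paper and papers per author with base R only.

```r
# Toy AU column: three papers, authors separated by ";" (hypothetical names).
au <- c("SMITH J;JONES A", "JONES A;LEE K", "JONES A")

# One character vector of authors per paper.
author_list <- strsplit(au, ";", fixed = TRUE)

# Number of authors on each paper.
n_authors <- lengths(author_list)

# Number of papers per author, most productive first.
papers_per_author <- sort(table(unlist(author_list)), decreasing = TRUE)

n_authors
papers_per_author
```

On real data, the same name can be written in several ways (initials, diacritics), so counts from an uncleaned author list should be read with care.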
Cited references for each included paper are in the CR column of the “bib” data frame. They are stored as a single string, also separated by semicolons (;). We can have a look at them and check whether familiar names were cited, e.g.:
names(bib)
#> [1] "AU" "TI" "SO" "JI" "DT" "DT1"
#> [7] "DE" "ID" "AB" "C1" "RP" "CR"
#> [13] "TC" "PY" "DI" "UT" "DB" "AU_UN"
#> [19] "AU1_UN" "AU_UN_NR" "SR"
#bib$CR[1] #display a list of cited references for the first paper in the data frame (it is a long string!)
#check whether some of these names are cited:
grep("NAKAGAWA", bib$CR) #yes, in many papers (the numbers are row indices of the citing papers)
#> [1] 2 6 7 17 20 33 36 37 56 72 75 102 109 121 145 152 166
#> [18] 189 207 222 249 285 293 312 330 361 362 368 370 401 438 440 455 461
#> [35] 471 475 489 501 512 560 562 573 590 620 655 690 713 730 770 851 899
grep("CORNWELL, W.", bib$CR) #yes, 1 (still, this could be some other W. CORNWELL...)
#> [1] 15
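One detail worth knowing when searching the CR strings: grep treats its pattern as a regular expression by default, so a full stop matches any character. The sketch below illustrates the difference on a toy vector `cr` of hypothetical reference strings written in the style of the CR field.

```r
# Toy cited-reference strings (hypothetical, in the style of the CR field).
cr <- c("CORNWELL, W.K., SOME TITLE (2010) J ECOL",
        "CORNWELL, WILLIAMS, OTHER TITLE (2005) OIKOS",
        "NAKAGAWA, S., THIRD TITLE (2007) BIOL REV")

# As a regular expression, "." matches any single character,
# so the pattern also matches "CORNWELL, WI".
regex_hits <- grep("CORNWELL, W.", cr)

# With fixed = TRUE the pattern is matched literally, full stop included.
literal_hits <- grep("CORNWELL, W.", cr, fixed = TRUE)

# grepl() returns TRUE/FALSE per paper, which is handy for counting.
n_citing <- sum(grepl("NAKAGAWA", cr, fixed = TRUE))
```

Here the regular-expression search returns both CORNWELL entries, while the literal search returns only the first.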
Luckily, the bibliometrix package has a handy function that summarises the information contained in the “bib” data frame, so we can get some quick facts about our set of papers:
# Preliminary descriptive analysis
results <- biblioAnalysis(bib, sep = ";")
summary(object = results, k = 20, pause = TRUE)
#>
#>
#> Main Information about data
#>
#> Documents 1167
#> Sources (Journals, Books, etc.) 380
#> Keywords Plus (ID) 7484
#> Author's Keywords (DE) 3349
#> Period 1980 - 2019
#> Average citations per documents 27
#>
#> Authors 3922
#> Author Appearances 4728
#> Authors of single authored documents 61
#> Authors of multi authored documents 3861
#>
#> Documents per Author 0.298
#> Authors per Document 3.36
#> Co-Authors per Documents 4.05
#> Collaboration Index 3.57
#>
#> Hit <Return> to see next table:
#>
#> Annual Scientific Production
#>
#> Year Articles
#> 1980 1
#> 1981 2
#> 1983 1
#> 1984 1
#> 1986 2
#> 1987 2
#> 1988 4
#> 1990 3
#> 1991 2
#> 1992 3
#> 1993 11
#> 1994 8
#> 1995 8
#> 1996 10
#> 1997 17
#> 1998 20
#> 1999 14
#> 2000 20
#> 2001 22
#> 2002 19
#> 2003 29
#> 2004 28
#> 2005 29
#> 2006 46
#> 2007 37
#> 2008 42
#> 2009 44
#> 2010 53
#> 2011 66
#> 2012 74
#> 2013 90
#> 2014 98
#> 2015 87
#> 2016 73
#> 2017 100
#> 2018 73
#> 2019 28
#>
#> Annual Percentage Growth Rate 9.698031
#>
#> Hit <Return> to see next table:
#>
#> Most Productive Authors
#>
#> Authors Articles Authors Articles Fractionalized
#> 1 POULIN R 17 POULIN R 8.25
#> 2 MERINO S 9 ELENA SF 3.03
#> 3 MORENO J 9 HURD H 2.92
#> 4 SAKALUK SK 9 BENESH DP 2.83
#> 5 RANTALA MJ 8 MORET Y 2.62
#> 6 JOKELA J 7 ROY BA 2.58
#> 7 SORCI G 7 TSENG M 2.50
#> 8 ARRIERO E 6 WEBSTER JP 2.50
#> 9 ELENA SF 6 KOELLA JC 2.33
#> 10 HASSELQUIST D 6 TURNER PE 2.28
#> 11 MORET Y 6 MERINO S 2.24
#> 12 ROTH O 6 EBERT D 2.08
#> 13 SHELDON BC 6 MORENO J 2.08
#> 14 THOMAS F 6 JOKELA J 2.03
#> 15 WEDELL N 6 SCHMID-HEMPEL P 2.00
#> 16 WINGFIELD JC 6 WEDELL N 1.98
#> 17 ASHBY B 5 HEINS DC 1.92
#> 18 BENSCH S 5 ROTH O 1.92
#> 19 BUCKLING A 5 DAY T 1.83
#> 20 CÉZILLY F 5 THOMAS F 1.77
#>
#> Hit <Return> to see next table:
#>
#> Top manuscripts per citations
#>
#> Paper TC TCperYear
#> 1 FOLSTAD I, 1992, AMERICAN NATURALIST 1827 67.7
#> 2 SCHULZ B, 2005, MYCOL RES 719 51.4
#> 3 SCHRECK CB, 2001, AQUACULTURE 356 19.8
#> 4 BONNEAUD C, 2003, AM NAT 345 21.6
#> 5 NORDLING D, 1998, PROC R SOC B BIOL SCI 306 14.6
#> 6 GUSTAFSSON L, 1994, PHILOSOPHICAL TRANSACTIONS - ROYAL SOCIETY OF LONDON, B 300 12.0
#> 7 OTS I, 1998, FUNCT ECOL 286 13.6
#> 8 GARCIADELEANIZ C, 2007, BIOL REV 252 21.0
#> 9 SPRENT JI, 2007, NEW PHYTOL 244 20.3
#> 10 LOVE OP, 2005, AM NAT 225 16.1
#> 11 JOHNSON WE, 2010, SCIENCE 222 24.7
#> 12 MARZAL A, 2005, OECOLOGIA 215 15.4
#> 13 ILMONEN P, 2000, PROC R SOC B BIOL SCI 203 10.7
#> 14 MCGRAW EA, 2002, PROC NATL ACAD SCI U S A 198 11.6
#> 15 CASTO JM, 2001, AM NAT 192 10.7
#> 16 JOHNSTON IA, 2006, J EXP BIOL 183 14.1
#> 17 ROHR JR, 2010, ENVIRON HEALTH PERSPECT 173 19.2
#> 18 ASGHAR M, 2015, SCIENCE 166 41.5
#> 19 VELANDO A, 2006, PROC R SOC B BIOL SCI 164 12.6
#> 20 BENSCH S, 2007, J ANIM ECOL 151 12.6
#>
#> Hit <Return> to see next table:
#>
#> Most Productive Countries (of corresponding authors)
#>
#> Country Articles Freq SCP MCP MCP_Ratio
#> 1 USA 267 0.28465 209 58 0.217
#> 2 UNITED KINGDOM 101 0.10768 64 37 0.366
#> 3 FRANCE 73 0.07783 52 21 0.288
#> 4 CANADA 52 0.05544 38 14 0.269
#> 5 GERMANY 52 0.05544 28 24 0.462
#> 6 SPAIN 50 0.05330 28 22 0.440
#> 7 FINLAND 35 0.03731 19 16 0.457
#> 8 SWEDEN 29 0.03092 17 12 0.414
#> 9 SWITZERLAND 28 0.02985 16 12 0.429
#> 10 AUSTRALIA 20 0.02132 16 4 0.200
#> 11 NEW ZEALAND 20 0.02132 12 8 0.400
#> 12 BRAZIL 16 0.01706 9 7 0.438
#> 13 CHINA 15 0.01599 10 5 0.333
#> 14 NORWAY 14 0.01493 8 6 0.429
#> 15 JAPAN 13 0.01386 11 2 0.154
#> 16 ITALY 11 0.01173 6 5 0.455
#> 17 ARGENTINA 10 0.01066 8 2 0.200
#> 18 NETHERLANDS 10 0.01066 4 6 0.600
#> 19 BELGIUM 9 0.00959 3 6 0.667
#> 20 INDIA 9 0.00959 7 2 0.222
#>
#>
#> SCP: Single Country Publications
#>
#> MCP: Multiple Country Publications
#>
#> Hit <Return> to see next table:
#>
#> Total Citations per Country
#>
#> Country Total Citations Average Article Citations
#> 1 USA 7924 29.68
#> 2 UNITED KINGDOM 4566 45.21
#> 3 FRANCE 2296 31.45
#> 4 NORWAY 2180 155.71
#> 5 GERMANY 1831 35.21
#> 6 SWEDEN 1653 57.00
#> 7 CANADA 1402 26.96
#> 8 SPAIN 1123 22.46
#> 9 FINLAND 1078 30.80
#> 10 SWITZERLAND 1031 36.82
#> 11 AUSTRALIA 621 31.05
#> 12 ESTONIA 440 110.00
#> 13 NEW ZEALAND 410 20.50
#> 14 ITALY 303 27.55
#> 15 MEXICO 288 36.00
#> 16 NETHERLANDS 221 22.10
#> 17 POLAND 215 23.89
#> 18 CHINA 203 13.53
#> 19 GEORGIA 197 24.62
#> 20 JAPAN 185 14.23
#>
#> Hit <Return> to see next table:
#>
#> Most Relevant Sources
#>
#> Sources Articles
#> 1 JOURNAL OF EVOLUTIONARY BIOLOGY 41
#> 2 PROCEEDINGS OF THE ROYAL SOCIETY B: BIOLOGICAL SCIENCES 40
#> 3 EVOLUTION 35
#> 4 PARASITOLOGY 35
#> 5 OECOLOGIA 32
#> 6 PLOS ONE 30
#> 7 AMERICAN NATURALIST 25
#> 8 FUNCTIONAL ECOLOGY 24
#> 9 BMC EVOLUTIONARY BIOLOGY 22
#> 10 BEHAVIORAL ECOLOGY AND SOCIOBIOLOGY 20
#> 11 JOURNAL OF ANIMAL ECOLOGY 19
#> 12 OIKOS 17
#> 13 ANIMAL BEHAVIOUR 15
#> 14 BEHAVIORAL ECOLOGY 15
#> 15 ECOLOGY AND EVOLUTION 14
#> 16 INTERNATIONAL JOURNAL FOR PARASITOLOGY 13
#> 17 CANADIAN JOURNAL OF ZOOLOGY 11
#> 18 EVOLUTIONARY ECOLOGY 11
#> 19 HORMONES AND BEHAVIOR 11
#> 20 JOURNAL OF PARASITOLOGY 11
#>
#> Hit <Return> to see next table:
#>
#> Most Relevant Keywords
#>
#> Author Keywords (DE) Articles Keywords-Plus (ID) Articles
#> 1 PHENOTYPIC PLASTICITY 67 FEMALE 469
#> 2 REPRODUCTION 45 MALE 405
#> 3 FITNESS 41 ARTICLE 382
#> 4 REPRODUCTIVE SUCCESS 39 REPRODUCTION 334
#> 5 LIFE HISTORY 33 PHYSIOLOGY 296
#> 6 IMMUNITY 31 NONHUMAN 285
#> 7 TRADE-OFF 31 GENETICS 224
#> 8 PARASITE 28 ANIMAL 210
#> 9 VIRULENCE 27 REPRODUCTIVE SUCCESS 210
#> 10 WOLBACHIA 26 CONTROLLED STUDY 185
#> 11 TERMINAL INVESTMENT 23 GENETIC FITNESS 160
#> 12 DISEASE 22 FITNESS 157
#> 13 PARASITISM 22 PHENOTYPIC PLASTICITY 154
#> 14 FECUNDITY 19 PHENOTYPE 149
#> 15 RESISTANCE 19 PRIORITY JOURNAL 145
#> 16 SEXUAL SELECTION 19 REPRODUCTIVE FITNESS 144
#> 17 LIFE-HISTORY TRADE-OFFS 18 ANIMALS 137
#> 18 TESTOSTERONE 17 MICROBIOLOGY 136
#> 19 INFECTION 16 HOST PARASITE INTERACTION 119
#> 20 PARASITES 16 HOST-PARASITE INTERACTION 119
Using the summary function on the bibliometrix results, we get several screens with tables summarising the bibliometric data from our data frame: how many documents, journals, keywords and authors, the publication timespan, collaboration index, annual publication growth rate, most prolific authors, publications per country, per journal, per keyword, etc.
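Several of these indicators are simple ratios that you can check by hand. The sketch below recomputes a few of them from the numbers printed in the summary output. The growth-rate formula is an assumption on our part (a compound annual growth rate over the listed years); it is not taken from the bibliometrix documentation, but it does reproduce the reported value.

```r
# Values copied from the summary output above.
documents          <- 1167
authors            <- 3922
author_appearances <- 4728

documents / authors            # Documents per Author, about 0.298
authors / documents            # Authors per Document, about 3.36
author_appearances / documents # Co-Authors per Documents, about 4.05

# MCP_Ratio for the USA row: multi-country papers / all papers.
mcp_ratio_usa <- 58 / 267      # about 0.217

# Compound annual growth: 1 article in the first listed year, 28 in the last,
# and 37 listed years (years with no articles are not listed).
first_year <- 1
last_year  <- 28
n_years    <- 37
growth <- ((last_year / first_year)^(1 / (n_years - 1)) - 1) * 100
growth                         # about 9.698
```

Note that the last year (2019) is incomplete in this data set, which pulls the growth rate down.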
You can plot five of these tables (hit “Return” to display the next graph; later you can use the arrows in the top left of the Plots pane to move back and forth between consecutive graphs stored in RStudio memory):
plot(results, k = 20, pause=TRUE) #this takes top 20 values from each plottable table
#> Hit <Return> to see next plot:
#> Hit <Return> to see next plot:
#> Hit <Return> to see next plot:
#> Hit <Return> to see next plot:
#> Hit <Return> to see next plot:
#the code below is for saving these plots into a pdf
# pdf(file = "plots/bib_descriptive_plots.pdf", height = 8, width = 8, pointsize=10) #
# plot(results, k = 20, pause=FALSE) #this takes top 20 values from each plottable table
# dev.off()
The cited papers in the CR field of the data frame can be analysed using the citations function.
It makes it easy to generate frequency tables of the most cited papers or the most cited first authors from the reference lists of our papers downloaded from Scopus.
Ten most cited papers:
mostcited <- citations(bib, field = "article", sep = ";")
cbind(mostcited$Cited[1:10])
#> [,1]
#> LOCHMILLER, RL, DEERENBERG, C, TRADE-OFFS IN EVOLUTIONARY IMMUNOLOGY: JUST WHAT IS THE COST OF IMMUNITY? (2000) OIKOS, 88, PP 87-98 46
#> SHELDON, BC, VERHULST, S, ECOLOGICAL IMMUNOLOGY: COSTLY PARASITE DEFENCES AND TRADE-OFFS IN EVOLUTIONARY ECOLOGY (1996) TRENDS ECOL EVOL, 11, PP 317-321 35
#> HAMILTON, WD, ZUK, M, HERITABLE TRUE FITNESS AND BRIGHT BIRDS: A ROLE FOR PARASITES? (1982) SCIENCE, 218, PP 384-387 32
#> MORET, Y, SCHMID-HEMPEL, P, SURVIVAL FOR IMMUNITY: THE PRICE OF IMMUNE SYSTEM ACTIVATION FOR BUMBLEBEE WORKERS (2000) SCIENCE, 290, PP 1166-1168 30
#> FORBES, MRL, PARASITISM AND HOST REPRODUCTIVE EFFORT (1993) OIKOS, 67, PP 444-450 29
#> STEARNS, SC, (1992) THE EVOLUTION OF LIFE HISTORIES, , OXFORD UNIVERSITY PRESS, OXFORD 26
#> FRANK, SA, MODELS OF PARASITE VIRULENCE (1996) Q REV BIOL, 71, PP 37-78 24
#> MINCHELLA, DJ, HOST LIFE-HISTORY VARIATION IN RESPONSE TO PARASITISM (1985) PARASITOLOGY, 90, PP 205-216 24
#> MINCHELLA, DJ, LOVERDE, PT, A COST OF INCREASED EARLY REPRODUCTIVE EFFORT IN THE SNAIL BIOMPHALARIA GLABRATA (1981) AM NAT, 118, PP 876-881 23
#> ROLFF, J, SIVA-JOTHY, MT, INVERTEBRATE ECOLOGICAL IMMUNOLOGY (2003) SCIENCE, 301, PP 472-475 18
Ten most cited authors:
mostcited <- citations(bib, field = "author", sep = ";")
cbind(mostcited$Cited[1:10])
#> [,1]
#> WINGFIELD, JC 425
#> POULIN, R 410
#> MØLLER, AP 383
#> HASSELQUIST, D 326
#> SCHMID-HEMPEL, P 326
#> READ, AF 306
#> ZUK, M 281
#> EBERT, D 275
#> SHELDON, BC 263
#> BENSCH, S 229
The function localCitations generates frequency tables of the locally most cited authors and papers. “Locally” means that citations are counted only within the given data set, i.e. how many times an author/paper that is in this data set has been cited by other authors/papers also in the data set.
Ten most frequently locally cited authors and papers:
mostcited <- localCitations(bib, sep = ";")
#> Articles analysed 100
#> Articles analysed 200
#> Articles analysed 300
#> Articles analysed 400
#> Articles analysed 500
#> Articles analysed 600
#> Articles analysed 700
#> Articles analysed 800
#> Error in grep(x, M$CR[M$PY >= Year]): invalid regular expression, reason 'Invalid character range'
mostcited$Authors[1:10,]
#> NULL
mostcited$Papers[1:10,]
#> NULL
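The idea of local citations can be illustrated without the package. The sketch below is a simplified base-R illustration, not the localCitations implementation: it uses a toy data frame `papers` with hypothetical records and assumes that each cited reference matches a paper label exactly, which real reference strings rarely do.

```r
# Toy collection: each paper has a short label (SR) and a ";"-separated
# list of cited references (CR). All records are hypothetical.
papers <- data.frame(
  SR = c("A 2000", "B 2005", "C 2010"),
  CR = c("X 1990;Y 1995", "A 2000;X 1990", "A 2000;B 2005;X 1990"),
  stringsAsFactors = FALSE
)

# Split every reference list into individual cited references.
cited <- strsplit(papers$CR, ";", fixed = TRUE)

# All citations given by the collection ("X 1990" is outside the collection).
all_counts <- table(unlist(cited))

# Local citations: for each paper in the set, count how many other papers
# in the set list it among their references.
local_counts <- vapply(
  papers$SR,
  function(p) sum(vapply(cited, function(refs) p %in% refs, logical(1))),
  numeric(1)
)
local_counts
```

In this toy set “X 1990” is the most cited reference overall, but it has no local citations because it is not itself part of the collection.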
So far, we looked only at the numbers - who or what gets cited most, either from the main papers list or from the lists of the references within these papers. Now it is time to look at the actual networks of citations and also other types of networks that can be created using our data set.
To do so we will create various rectangular matrices that reflect connections between different attributes of papers/authors. These matrices can then be plotted as bipartite networks and analysed.
Co-citation or coupling networks are a special type of network resulting from scientific papers containing references to other scientific papers.
The bibliometrix package contains the function biblioNetwork, which makes creating bibliographic networks easy. This function can create the most frequently used coupling networks: Authors, Sources, and Countries.
Bibliographic coupling - two articles are bibliographically coupled if they share at least one reference, i.e. at least one cited source appears in the reference lists/bibliographies of both papers (Kessler, 1963).
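To make the definition concrete, coupling can be computed from a papers-by-references incidence matrix. The sketch below uses a toy matrix `A` with hypothetical papers and references. The normalisation step shows the usual Salton's cosine formula; whether networkPlot applies exactly this calculation for normalize = "salton" is an assumption here.

```r
# Toy incidence matrix A: rows are papers, columns are cited references.
# A[i, j] = 1 when paper i cites reference j (hypothetical data).
A <- matrix(c(1, 1, 0,
              1, 0, 1,
              0, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("P1", "P2", "P3"), c("R1", "R2", "R3")))

# Coupling matrix: B[i, j] = number of references shared by papers i and j.
B <- A %*% t(A)

# Salton's cosine rescales each link by the reference-list lengths (diagonal).
S <- B / sqrt(outer(diag(B), diag(B)))

B
round(S, 2)
```

Papers P1 and P3 share no references, so they are not coupled, while P2 is coupled to both.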
NetMatrix <- biblioNetwork(bib, analysis = "coupling", network = "references", sep = ";")
net <- networkPlot(NetMatrix, normalize = "salton", weighted = NULL, n = 10, Title = "Papers' bibliographic coupling", type = "fruchterman", size = 5, size.cex = TRUE, remove.multiple = TRUE, labelsize = 0.5, label.cex = FALSE)
Above, we plotted only the 10 most coupled papers (n = 10). Try increasing this number to 100 (we would not recommend increasing the number of displayed nodes further - it gets slow and messy).
What happens and why?
Authors’ bibliographic coupling - two authors are bibliographically coupled if they share at least one reference in their reference lists.
NetMatrix <- biblioNetwork(bib, analysis = "coupling", network = "authors", sep = ";")
net <- networkPlot(NetMatrix, normalize = "salton", weighted = NULL, n = 10, Title = "Authors' bibliographic coupling", type = "fruchterman", size = 5, size.cex = TRUE, remove.multiple = TRUE, labelsize = 0.8, label.n = 10, label.cex = FALSE)
Above, we plotted only the 10 most coupled authors (n = 10). Try increasing this number to 100 (we would not recommend increasing the number of displayed nodes further - it gets slow and messy).
What happens and why?
Bibliographic co-citation is, in a sense, the opposite of bibliographic coupling: two papers are linked by co-citation when both are cited in a third paper.
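Co-citation can be computed from the same kind of incidence matrix as coupling, multiplied the other way round. The sketch below uses a toy matrix `A` with hypothetical papers and references.

```r
# Toy incidence matrix: rows are citing papers, columns are cited references
# (hypothetical data).
A <- matrix(c(1, 1, 0,
              1, 1, 1,
              0, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("P1", "P2", "P3"), c("R1", "R2", "R3")))

# Co-citation matrix: C[i, j] = number of papers citing both references.
C <- crossprod(A)   # same as t(A) %*% A

C
```

References R1 and R2 are cited together by two papers, so they have the strongest co-citation link. The result has one row and column per cited reference, which is why this matrix grows so quickly on real data.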
NetMatrix <- biblioNetwork(bib[1:50,], analysis = "co-citation", network = "references", sep = ";")
net <- networkPlot(NetMatrix, normalize = "salton", weighted = NULL, n = 50, Title = "Papers' co-citations", type = "fruchterman", size = 5, size.cex = TRUE, remove.multiple = TRUE, labelsize = 0.5, label.cex = FALSE)
Note that for creating this matrix we used only the first 50 papers from our data set, because the resulting matrix is a matrix of ALL cited papers and it gets HUGE. The code above plots the 50 most co-cited papers (n = 50); try a smaller value such as n = 10 and compare the plots (we would not recommend increasing the number of displayed nodes much further - it gets slow and messy).
What happens and why?
Bibliographic collaboration is a network where nodes are authors and links are co-authorships on the papers.
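A collaboration matrix of this kind can be built directly from the author strings. The sketch below is a base-R illustration on a toy vector `au` with hypothetical names, not the biblioNetwork implementation.

```r
# Toy AU column (hypothetical names), authors separated by ";".
au <- c("SMITH J;JONES A", "JONES A;LEE K", "SMITH J;JONES A;LEE K")

author_list <- strsplit(au, ";", fixed = TRUE)

# Long table of (paper, author) pairs, then a paper x author incidence matrix.
pairs <- data.frame(paper  = rep(seq_along(author_list), lengths(author_list)),
                    author = unlist(author_list))
A <- unclass(table(pairs$paper, pairs$author))

# Collaboration matrix: off-diagonal cells count co-authored papers,
# the diagonal counts each author's papers.
collab <- crossprod(A)
collab
```

Each off-diagonal cell becomes the weight of a link between two authors in the network plot.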
NetMatrix <- biblioNetwork(bib, analysis = "collaboration", network = "authors", sep = ";")
net <- networkPlot(NetMatrix, normalize = "salton", weighted = NULL, n = 10, Title = "Authors' collaborations", type = "fruchterman", size = 5, size.cex = TRUE, remove.multiple = TRUE, labelsize = 0.5, label.cex = FALSE)
Above, we plotted only the 10 most collaborating authors (n = 10). Try increasing this number to 100 (we would not recommend increasing the number of displayed nodes further - it gets slow and messy).
What happens and why?
Country Scientific Collaboration - we can visualise authors from which countries publish papers together most frequently.
bib <- metaTagExtraction(bib, Field = "AU_CO", sep = ";") #we need to extract countries from the affiliations first
NetMatrix <- biblioNetwork(bib, analysis = "collaboration", network = "countries", sep = ";")
net <- networkPlot(NetMatrix, n = 100, Title = "Country Collaboration", type = "auto", size = TRUE, remove.multiple = FALSE, labelsize = 0.5)
Above, we plotted up to the 100 most collaborating countries (n = 100). Try reducing this number to 10 and compare the plots (we would not recommend displaying many more nodes - it gets slow and messy).
What happens and why?
Keyword co-occurrences - we can also visualise which keywords (from the Scopus database) most often appear together in the same papers.
NetMatrix <- biblioNetwork(bib, analysis = "co-occurrences", network = "keywords", sep = ";")
net <- networkPlot(NetMatrix, n = 50, Title = "Keyword co-occurrence", type = "fruchterman", size = TRUE, remove.multiple = FALSE, labelsize = 0.7, edgesize = 5)
Try replacing network = "keywords" with network = "author_keywords" and see what happens. You can also try to display fewer/more keywords in the plot.
Co-Word Analysis - uses the word co-occurrences in a bibliographic collection to map the conceptual structure of research. It works via a separate function conceptualStructure that creates a conceptual structure map of a scientific field performing Correspondence Analysis (CA), Multiple Correspondence Analysis (MCA) or Metric Multidimensional Scaling (MDS) and Clustering of a bipartite network of terms extracted from keyword, title or abstract fields of the data frame.
CS <- conceptualStructure(bib, field="ID", method="CA", minDegree=4, k.max=8, stemming=FALSE, labelsize=10, documents=10)
The code above uses the ID field, which holds the database-assigned keywords (labelled “Keywords Plus (ID)” in the summary tables). Try using the authors' keywords, the “DE” field, instead. Is the map different?
Historical Direct Citation Network - represents a chronological network map of the most relevant direct citations in a bibliographic collection, i.e. who is citing whom and in what order. The histNetwork function calculates a chronological direct citation network matrix, which is then plotted using histPlot:
#options(width=130)
histResults <- histNetwork(bib, min.citations = 10, sep = ";")
#> Articles analysed 100
#> Articles analysed 200
#> Articles analysed 300
#> Articles analysed 400
#> Articles analysed 500
#> Articles analysed 600
#> Articles analysed 656
net <- histPlot(histResults, n=15, size = 20, labelsize=10, size.cex=TRUE, arrowsize = 0.5, color = TRUE)
#>
#> Legend
#>
#> Paper
#> 1992 - 15 FOLSTAD I, 1992, AMERICAN NATURALIST
#> 1994 - 26 GUSTAFSSON L, 1994, PHILOSOPHICAL TRANSACTIONS - ROYAL SOCIETY OF LONDON, B
#> 1997 - 62 SIIKAMÄKI P, 1997, FUNCT ECOL
#> 1997 - 63 ALLANDER K, 1997, FUNCT ECOL
#> 1998 - 72 NORDLING D, 1998, PROC R SOC B BIOL SCI
#> 2000 - 100 ILMONEN P, 2000, PROC R SOC B BIOL SCI
#> 2002 - 143 AHMED AM, 2002, OIKOS
#> 2003 - 167 BONNEAUD C, 2003, AM NAT
#> 2004 - 194 JACOT A, 2004, EVOLUTION
#> 2004 - 199 BONNEAUD C, 2004, EVOLUTION
#> 2005 - 219 MARZAL A, 2005, OECOLOGIA
#> 2006 - 246 VELANDO A, 2006, PROC R SOC B BIOL SCI
#> 2008 - 308 MARZAL A, 2008, J EVOL BIOL
#> 2009 - 346 KNOWLES SCL, 2009, FUNCT ECOL
#> 2010 - 387 KNOWLES SCL, 2010, J EVOL BIOL
#> DOI Year LCS GCS
#> 1992 - 15 10.1086/285346 1992 35 1827
#> 1994 - 26 10.1098/RSTB.1994.0149 1994 26 300
#> 1997 - 62 10.1046/J.1365-2435.1997.00075.X 1997 14 47
#> 1997 - 63 10.1046/J.1365-2435.1997.00095.X 1997 14 66
#> 1998 - 72 10.1098/RSPB.1998.0432 1998 31 306
#> 2000 - 100 10.1098/RSPB.2000.1053 2000 17 203
#> 2002 - 143 10.1034/J.1600-0706.2002.970307.X 2002 14 109
#> 2003 - 167 10.1086/346134 2003 34 345
#> 2004 - 194 10.1111/J.0014-3820.2004.TB01603.X 2004 17 105
#> 2004 - 199 10.1111/J.0014-3820.2004.TB01633.X 2004 20 119
#> 2005 - 219 10.1007/S00442-004-1757-2 2005 21 215
#> 2006 - 246 10.1098/RSPB.2006.3480 2006 16 164
#> 2008 - 308 10.1111/J.1420-9101.2008.01545.X 2008 15 137
#> 2009 - 346 10.1111/J.1365-2435.2008.01507.X 2009 16 123
#> 2010 - 387 10.1111/J.1420-9101.2009.01920.X 2010 13 136
More: you can use different types of network plots by tweaking the “type” parameter of the networkPlot function (check the vignette for the available options). The type sets the network map layout: circle, kamada-kawai, mds, etc.
You can also use non-R tools to visualise bibliographic networks, e.g. the VOSviewer software by Nees Jan van Eck and Ludo Waltman (http://www.vosviewer.com). When you use type = "vosviewer" in the networkPlot function, it exports the network as a standard Pajek network file (named “vosnetwork.net”), which can be used in other network-plotting software, including VOSviewer.